Allow database dumps to be run against the read-only replica #2144
In production, the scheduler was configured to run:

```
./target/release/enqueue-job dump_db $READ_ONLY_REPLICA
```

instead of:

```
./target/release/enqueue-job dump_db $READ_ONLY_REPLICA_URL
```

Because of the typo in the variable name, the dump_db job has been running against the primary database. This patch modifies the fallback to ensure the primary database is only used if explicitly provided.
(rust_highfive has picked a reviewer for you, use r? to override)
So it turns out it's actually good that the scheduler was configured incorrectly; the dump_db task can't currently run on read-only databases because it creates views :( (see crates-io-incident-response for more info)
We will need to rethink how this script works, it cannot currently be run against the replica (and changing the scheduler config to hit it caused the job to begin erroring). The issues we need to address are:
This addresses the following error:

```
sql_error_code = 0A000 ERROR: cannot use serializable mode in a hot standby
sql_error_code = 0A000 HINT: You can use REPEATABLE READ instead.
sql_error_code = 0A000 STATEMENT: BEGIN ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE;
```

Because our application does not rely on any other `SERIALIZABLE` transactions, it should be fine to relax the isolation level for database dumps to `REPEATABLE READ`. `DEFERRABLE` is dropped because it is only meaningful when combined with `SERIALIZABLE`.
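In SQL terms, the change to the dump's opening statement amounts to the following before/after sketch (not the full dump script):

```sql
-- Before: rejected on a hot standby, which cannot participate in
-- serializable conflict tracking on the leader.
BEGIN ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE;

-- After: REPEATABLE READ still pins a single consistent snapshot for the
-- whole transaction, and is permitted on a read-only replica.
BEGIN ISOLATION LEVEL REPEATABLE READ, READ ONLY;
```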
`READ_ONLY_REPLICA_URL` for db_dump
Sorry again for the confusion on this stealth configuration change. I should have announced it in the appropriate channel. I've pushed 2 additional commits to avoid creating temporary views and to switch to the `REPEATABLE READ` isolation level. I've tested locally (against a normal database), but we can manually schedule a database dump against the read-only replica and then update the existing daily task if that is successful.
Sounds good to me
```diff
@@ -1,19 +1,7 @@
-BEGIN;
+BEGIN ISOLATION LEVEL REPEATABLE READ, READ ONLY;
```
This provides somewhat weaker consistency guarantees for the snapshot used for the database dump. The Postgres documentation explicitly recommends using `SERIALIZABLE READ ONLY DEFERRABLE` for backups.

We don't seem to have a very strong case for moving the dump to the read-only replica, since it doesn't seem to have caused any problems on the main replica. Neither does it seem unreasonable to do so, even with the weaker consistency guarantees.
One more question – are we sure `REPEATABLE READ` will work on the hot spare? It isn't obvious why `SERIALIZABLE READ ONLY` would not work, so we should test this on stage.
> are we sure `REPEATABLE READ` will work on the hot spare?

I definitely expect this to work. We can test by deploying the code, manually scheduling an export, verifying that everything works, and then finally updating the daily task.

> It isn't obvious why `SERIALIZABLE READ ONLY` would not work

This definitely was a surprise to me as well. But I hope my other response helps explain why it isn't possible for the read-replica to be directly involved in cycle tracking on the leader. I've seen several suggestions that using `DEFERRABLE` from a replica could be supported in future releases by writing some additional data in the WAL so that the follower can find a valid snapshot to start from. It appears no such enhancement has been implemented yet.

> we should test this on stage.

We can't actually test this on staging, because Heroku only supports read-only replicas on paid database plans. But we shouldn't have any issues merging and deploying this change, as we can roll it out via a configuration change to the scheduler.

> We don't seem to have a very strong case for moving the dump to the read-only replica, since it doesn't seem to have caused any problems on the main replica.

I'm not entirely sure about that. I don't think any of the temporary response time spikes can be traced to the export specifically, but I've definitely seen slight performance improvements and much less variation in response times after moving just a few expensive endpoints (#2073) to the read-only replica. In total, that PR seems to have moved only around 600 queries per hour off of the primary database, but the improvements seem impressive. It also appears to have eliminated the occasional spikes in download response times. I've collected some rough numbers and plan to follow up on that PR with more details, but I would definitely prefer to run the export from there as well.
I'm perfectly fine with moving to the read replica, and with the additional explanations you gave I'm actually actively in favour of doing so.
```diff
@@ -13,7 +13,7 @@ fn main() -> Result<(), Error> {
     match &*job {
         "update_downloads" => Ok(tasks::update_downloads().enqueue(&conn)?),
         "dump_db" => {
-            let database_url = args.next().unwrap_or_else(|| env("DATABASE_URL"));
+            let database_url = args.next().unwrap_or_else(|| env("READ_ONLY_REPLICA_URL"));
```
I assumed we are passing in the replica URL on the command line in production.

Could we add something like this to `.env.sample`, to make local testing easier?

```
export READ_ONLY_REPLICA_URL=$DATABASE_URL
```

(I'm not sure variable expansion works in that file.)
> I'm not sure variable expansion works in that file.

I think that does work, but I don't recall for certain. In this case, I think I would prefer to remove the fallback entirely and add an error message suggesting to pass `$DATABASE_URL` via the shell.
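As a minimal sketch, dropping the fallback could look something like the following (the function name and exact error text here are hypothetical, not the actual patch):

```rust
/// Hypothetical sketch: resolve the target database URL for `dump_db`
/// without silently falling back to the primary database.
fn resolve_dump_db_url(cli_arg: Option<String>) -> Result<String, String> {
    // Prefer an explicit command-line argument; otherwise require the
    // replica URL to be present in the environment. Never default to
    // DATABASE_URL implicitly.
    cli_arg
        .or_else(|| std::env::var("READ_ONLY_REPLICA_URL").ok())
        .ok_or_else(|| {
            "dump_db: no target database given; pass a URL on the command \
             line (e.g. $DATABASE_URL via the shell) or set READ_ONLY_REPLICA_URL"
                .to_string()
        })
}

fn main() {
    // First CLI word is the job name, second is the optional database URL.
    match resolve_dump_db_url(std::env::args().nth(2)) {
        Ok(url) => println!("would dump from {url}"),
        Err(msg) => eprintln!("{msg}"),
    }
}
```

This keeps the primary reachable for local testing (pass the URL explicitly) while making the production default an error rather than a silent fallback.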
Sounds good.

The error message when you try to use serializable explicitly says to use repeatable read.
Long response incoming, because this is subtle, I've done some recent reading, and I think it's worth documenting my current mental model.

**Transaction Isolation**

There are 3 isolation levels available in PostgreSQL: `READ COMMITTED` (the default), `REPEATABLE READ`, and `SERIALIZABLE`.

Let's look at the final level a bit more closely.
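To make the `REPEATABLE READ` behavior concrete, here is a sketch (the table names are illustrative, not necessarily the actual schema):

```sql
-- Sketch: at REPEATABLE READ, every statement in the transaction reads
-- from the same snapshot, so the dump is internally consistent even if
-- writes commit on the leader while it runs.
BEGIN ISOLATION LEVEL REPEATABLE READ, READ ONLY;
SELECT count(*) FROM crates;    -- snapshot is taken at the first query
SELECT count(*) FROM versions;  -- reads from that same snapshot
COMMIT;
```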
@jtgeibel Thanks for taking the time for a detailed answer! I agree with your conclusion – given that we don't have any other transactions with increased isolation levels, there currently shouldn't be an observable difference. My main concern when introducing the isolation level was to make sure that at least all foreign key relations still work, so it's possible to re-import the data into the database. With `REPEATABLE READ`, the dump still comes from a single consistent snapshot, so that should be fine.
@bors r=smarnach
📌 Commit 4ae90e6 has been approved by `smarnach`
Allow database dumps to be run against the read-only replica

Edit: The description has been updated, with the original message included below.

This PR updates the database dump script so that it can be run against the read-only replica database. I've tested locally, but we can manually schedule a database dump against the read-only replica and then update the existing daily scheduled task.

## Original PR Description

In production, the scheduler was configured to run:

```
./target/release/enqueue-job dump_db $READ_ONLY_REPLICA
```

instead of:

```
./target/release/enqueue-job dump_db $READ_ONLY_REPLICA_URL
```

Because of the typo in the variable name, the dump_db job has been running against the primary database. This patch modifies the fallback to ensure the primary database is only used if explicitly provided.
☀️ Test successful - checks-travis